xwy-bit commented Apr 23, 2025

VisuLogic provides a benchmark and training dataset to evaluate and enhance MLLMs' visual reasoning.

🌐 Homepage | 🏆 Leaderboard | 📖 Paper | 🤗 Benchmark | 💻 Eval Code | 🤗 Train Data | 💻 Train Code

📖 Introduction

VisuLogic is a newly designed benchmark for evaluating the visual reasoning capabilities of Multi-modal Large Language Models (MLLMs) independently of textual reasoning processes. It features carefully constructed visual reasoning tasks organized into six categories according to the reasoning skill required (e.g., Quantitative Reasoning, which involves understanding and deducing changes in the quantity of elements in images). Unlike existing benchmarks, VisuLogic poses tasks that are inherently difficult to articulate in language, providing a more rigorous evaluation of MLLMs' visual reasoning capabilities. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans, revealing significant gaps in visual reasoning.
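For intuition, the sketch below shows how accuracy on a four-option multiple-choice benchmark like this can be computed and compared against the 25% random baseline and the 51.4% human reference. It is an illustrative example only, not the official evaluation code linked above; the variable names and answer format (single letters "A"–"D") are our assumptions.

```python
# Illustrative scoring sketch (not the official VisuLogic eval code).
# Assumes each item is a four-option multiple-choice question whose gold answer
# and model prediction are single option letters "A"-"D"; names are hypothetical.

RANDOM_BASELINE = 0.25   # four options -> 25% accuracy by chance
HUMAN_ACCURACY = 0.514   # human reference reported by the benchmark

def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of items where the predicted option letter matches the gold answer."""
    assert len(predictions) == len(answers) and len(answers) > 0
    correct = sum(p.strip().upper() == a.strip().upper()
                  for p, a in zip(predictions, answers))
    return correct / len(answers)

if __name__ == "__main__":
    # Toy example with made-up predictions and gold labels.
    preds = ["A", "C", "B", "D", "A"]
    golds = ["A", "B", "B", "D", "C"]
    acc = accuracy(preds, golds)
    print(f"accuracy: {acc:.1%} "
          f"(random baseline {RANDOM_BASELINE:.0%}, human {HUMAN_ACCURACY:.1%})")
```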

🌟 Key Features

  • 🚀 Visuo-Logical Challenge
    The first benchmark to integrate visual perception with logical reasoning, enabling authentic multimodal evaluation. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans.

  • 🛠️ Rigorous Design
    Includes 1,000 meticulously curated questions, spanning 6 domains and 23 subcategories, for comprehensive performance evaluation.

  • 📝 Anti-Linguistic Shortcut
    Designed to prevent language-based shortcuts, ensuring tasks rely on genuine visual reasoning rather than textual inference.

  • 💡 RL Exploration
    We identify reinforcement learning (RL) as a promising direction for improving the visual reasoning capabilities of MLLMs. Trained with RL, models reach state-of-the-art performance on VisuLogic!

  • Fully Open-source
    We open-source all the evaluation code, training scripts, and datasets associated with this work to promote further research and innovation.
